AITopics | semantic layout

Learning Hierarchical Semantic Image Manipulation through Structured Representations

Neural Information Processing Systems | Mar-16-2026, 21:27:27 GMT

Understanding, reasoning about, and manipulating semantic concepts of images has been a fundamental research problem for decades. Previous work mainly focused on direct manipulation of the natural image manifold through color strokes, key-points, textures, and holes-to-fill. In this work, we present a novel hierarchical framework for semantic image manipulation. Key to our hierarchical framework is that we employ a structured semantic layout as our intermediate representation for manipulation. Initialized with coarse-level bounding boxes, our layout generator first creates a pixel-wise semantic layout capturing the object shape, object-object interactions, and object-scene relations. Then our image generator fills in the pixel-level textures guided by the semantic layout. Such a framework allows a user to manipulate images at the object level by adding, removing, and moving one bounding box at a time. Experimental evaluations demonstrate the advantages of the hierarchical manipulation framework over existing image generation and context hole-filling models, both qualitatively and quantitatively. Benefits of the hierarchical framework are further demonstrated in applications such as semantic object manipulation, interactive image editing, and data-driven image manipulation.
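The two-stage flow described in this abstract (bounding box, then pixel-wise layout, then image) can be illustrated with a toy sketch. It is not the authors' model: the function names `generate_layout`, `render_image`, and `add_object` are hypothetical, the ellipse mask stands in for a learned layout generator, and the flat color fill stands in for a learned image generator.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive
Layout = List[List[int]]         # one integer class label per pixel


def generate_layout(layout: Layout, box: Box, label: int) -> Layout:
    """Placeholder layout generator: paint an ellipse-shaped mask inside the box.

    A learned generator would predict the object shape from its context;
    the ellipse only makes the data flow visible.
    """
    top, left, bottom, right = box
    cy, cx = (top + bottom - 1) / 2.0, (left + right - 1) / 2.0
    ry, rx = max((bottom - top) / 2.0, 0.5), max((right - left) / 2.0, 0.5)
    out = [row[:] for row in layout]  # copy so the input layout is untouched
    for y in range(max(top, 0), min(bottom, len(out))):
        for x in range(max(left, 0), min(right, len(out[0]))):
            if ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1.0:
                out[y][x] = label
    return out


def render_image(layout: Layout, palette: Dict[int, Tuple[int, int, int]]):
    """Placeholder image generator: map every label to a flat RGB color."""
    return [[palette[label] for label in row] for row in layout]


def add_object(layout: Layout, box: Box, label: int, palette):
    """One editing step: update the layout first, then re-render the image."""
    new_layout = generate_layout(layout, box, label)
    return new_layout, render_image(new_layout, palette)
```

The point of the sketch is the ordering: the user edits only a coarse box, the layout is updated as an intermediate representation, and the image is produced from the layout rather than from the box directly.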

artificial intelligence, manipulation, proceedings, (6 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.78)

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis

Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li

Neural Information Processing Systems | Feb-13-2026, 16:56:05 GMT

On the other hand, we rethink the functionality of convolutional layers for image synthesis.

artificial intelligence, discriminator, machine learning, (16 more...)

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Learning Hierarchical Semantic Image Manipulation through Structured Representations

Seunghoon Hong, Xinchen Yan, Thomas S. Huang, Honglak Lee

Neural Information Processing Systems | Feb-12-2026, 22:38:01 GMT

Neural Information Processing Systems http://nips.cc/

image manipulation, layout, manipulation, (14 more...)

Country:

North America > United States > Michigan (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis

Neural Information Processing Systems | Dec-25-2025, 21:40:51 GMT

Semantic image synthesis aims at generating photorealistic images from semantic layouts. Previous approaches based on conditional generative adversarial networks (GANs) show state-of-the-art performance on this task; they either feed the semantic label maps as inputs to the generator or use them to modulate the activations in normalization layers via affine transformations. We argue that convolutional kernels in the generator should be aware of the distinct semantic labels at different locations when generating images. To better exploit the semantic layout for the image generator, we propose to predict convolutional kernels conditioned on the semantic label map to generate the intermediate feature maps from the noise maps and eventually generate the images. Moreover, we propose a feature pyramid semantics-embedding discriminator, which is more effective in enhancing fine details and semantic alignments between the generated images and the input semantic layouts than previous multi-scale discriminators. We achieve state-of-the-art results on both quantitative metrics and subjective evaluation on various semantic segmentation datasets, demonstrating the effectiveness of our approach.
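The idea of kernels that are "aware of the distinct semantic labels at different locations" can be shown with a small sketch. It assumes a lookup table with one fixed 3x3 kernel per label, which replaces the learned kernel predictor described in the abstract. The function name `conditional_conv` and the `kernels` table are hypothetical.

```python
def conditional_conv(feature, label_map, kernels):
    """Apply a 3x3 filter whose weights depend on the label at each location.

    feature:   2D list of floats (a single-channel feature or noise map)
    label_map: 2D list of integer class labels, same size as `feature`
    kernels:   dict mapping label -> 3x3 list of weights (an assumed stand-in
               for a network that predicts kernels from the label map)
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            k = kernels[label_map[y][x]]  # kernel chosen by the label at (y, x)
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding at borders
                        acc += k[dy + 1][dx + 1] * feature[yy][xx]
            out[y][x] = acc
    return out
```

In contrast to an ordinary convolution, which shares one kernel across all positions, the weights here change wherever the label changes, so two regions with identical input features can still be filtered differently.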

name change, predict layout-to-image conditional convolution, semantic image synthesis, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.64)

Learning Hierarchical Semantic Image Manipulation through Structured Representations

Neural Information Processing Systems | Nov-20-2025, 22:18:04 GMT

Understanding, reasoning about, and manipulating semantic concepts of images has been a fundamental research problem for decades. Previous work mainly focused on direct manipulation of the natural image manifold through color strokes, key-points, textures, and holes-to-fill. In this work, we present a novel hierarchical framework for semantic image manipulation. Key to our hierarchical framework is that we employ a structured semantic layout as our intermediate representation for manipulation. Initialized with coarse-level bounding boxes, our layout generator first creates a pixel-wise semantic layout capturing the object shape, object-object interactions, and object-scene relations. Then our image generator fills in the pixel-level textures guided by the semantic layout. Such a framework allows a user to manipulate images at the object level by adding, removing, and moving one bounding box at a time. Experimental evaluations demonstrate the advantages of the hierarchical manipulation framework over existing image generation and context hole-filling models, both qualitatively and quantitatively. Benefits of the hierarchical framework are further demonstrated in applications such as semantic object manipulation, interactive image editing, and data-driven image manipulation.

learning hierarchical semantic image manipulation, name change, structured representation, (4 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.78)

Learning Hierarchical Semantic Image Manipulation through Structured Representations

Seunghoon Hong, Xinchen Yan, Thomas S. Huang, Honglak Lee

Neural Information Processing Systems | Nov-20-2025, 16:49:14 GMT

Then our image generator fills in the pixel-level textures guided by the semantic layout. Such a framework allows a user to manipulate images at the object level by adding, removing, and moving one bounding box at a time. Experimental evaluations demonstrate the advantages of the hierarchical manipulation framework over existing image generation and context hole-filling models, both qualitatively and quantitatively.

artificial intelligence, machine learning, manipulation, (17 more...)

Country:

North America > United States > Michigan (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis

Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li

Neural Information Processing Systems | Aug-19-2025, 23:48:22 GMT

State-of-the-art methods are mostly based on Generative Adversarial Networks (GANs).

discriminator, label map, semantic layout, (12 more...)

Country:

Asia > China > Hong Kong (0.05)
North America > Canada (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

ChangeDiff: A Multi-Temporal Change Detection Data Generator with Flexible Text Prompts via Diffusion Model

Qi Zang, Jiayi Yang, Shuang Wang, Dong Zhao, Wenjun Yi, Zhun Zhong

arXiv.org Artificial Intelligence | Dec-19-2024

Data-driven deep learning models have enabled tremendous progress in change detection (CD) with the support of pixel-level annotations. However, collecting diverse data and manually annotating them is costly, laborious, and knowledge-intensive. Existing generative methods for CD data synthesis show competitive potential in addressing this issue but still face the following limitations: 1) difficulty in flexibly controlling change events, 2) dependence on additional data to train the data generators, and 3) a focus on specific change detection tasks. To this end, this paper focuses on the semantic CD (SCD) task and develops a multi-temporal SCD data generator, ChangeDiff, by exploring powerful diffusion models. ChangeDiff generates change data in two steps: first, it uses text prompts and a text-to-layout (T2L) model to create continuous layouts, and then it employs a layout-to-image (L2I) model to convert these layouts into images. Specifically, we propose multi-class distribution-guided text prompts (MCDG-TP), allowing layouts to be generated flexibly through controllable classes and their corresponding ratios. Subsequently, to generalize the T2L model to the proposed MCDG-TP, a class distribution refinement loss is further designed as training supervision. MCDG-TP is proposed in three modes to synthesize new layout masks from various texts. Our generated data shows significant progress in temporal continuity, spatial diversity, and quality realism, empowering change detectors with accuracy and transferability. The code is available at https://github.com/DZhaoXd/ChangeDiff
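To make "controllable classes and their corresponding ratios" concrete, the sketch below derives class ratios from a label mask, formats them as a text prompt, and measures how far a generated layout drifts from the requested ratios. Everything here is an assumption for illustration: the prompt wording, the class names, and the L1 gap in `distribution_gap` are hypothetical and are not the MCDG-TP format or the class distribution refinement loss used by ChangeDiff.

```python
from collections import Counter


def class_ratios(layout, class_names):
    """Fraction of pixels per class in a 2D label mask (labels index class_names)."""
    counts = Counter(label for row in layout for label in row)
    total = sum(counts.values())
    return {name: counts.get(idx, 0) / total for idx, name in enumerate(class_names)}


def build_prompt(ratios):
    """Format class ratios as a text prompt (hypothetical wording)."""
    parts = [f"{round(100 * r)}% {name}" for name, r in ratios.items() if r > 0]
    return "A semantic layout with " + ", ".join(parts)


def distribution_gap(target, actual):
    """L1 distance between requested and observed class ratios.

    A simple stand-in for a distribution-matching training signal; the loss
    actually used in the paper is not specified in the abstract.
    """
    return sum(abs(target[name] - actual.get(name, 0.0)) for name in target)
```

A typical use would be to compute `class_ratios` on an existing mask, edit the ratios to describe a change event, and pass the result of `build_prompt` to a text-to-layout model.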

artificial intelligence, changediff, machine learning, (20 more...)

arXiv:2412.15541

Country:

North America > Honduras (0.04)
North America > Central America (0.04)
Europe > Slovakia (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis

Neural Information Processing Systems | Oct-10-2024, 18:43:53 GMT

Semantic image synthesis aims at generating photorealistic images from semantic layouts. Previous approaches based on conditional generative adversarial networks (GANs) show state-of-the-art performance on this task; they either feed the semantic label maps as inputs to the generator or use them to modulate the activations in normalization layers via affine transformations. We argue that convolutional kernels in the generator should be aware of the distinct semantic labels at different locations when generating images. To better exploit the semantic layout for the image generator, we propose to predict convolutional kernels conditioned on the semantic label map to generate the intermediate feature maps from the noise maps and eventually generate the images. Moreover, we propose a feature pyramid semantics-embedding discriminator, which is more effective in enhancing fine details and semantic alignments between the generated images and the input semantic layouts than previous multi-scale discriminators.

predict layout-to-image conditional convolution, semantic image synthesis, semantic layout, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Artificial Intelligence > Vision (0.65)

Referenceless User Controllable Semantic Image Synthesis

Jonghyun Kim, Gen Li, Joongkyu Kim

arXiv.org Artificial Intelligence | Jun-18-2023

Despite recent progress in semantic image synthesis, complete control over image style remains a challenging problem. Existing methods require reference images to feed style information into semantic layouts, which means that the style is constrained by the given image. In this paper, we propose a model named RUCGAN for user-controllable semantic image synthesis, which uses a single color to represent the style of a specific semantic region. The proposed network achieves reference-free semantic image synthesis by injecting colors as user-desired styles into each semantic layout, and is able to synthesize semantic images with unusual colors. Extensive experimental results on various challenging datasets show that the proposed method outperforms existing methods, and we further provide an interactive UI to demonstrate the advantage of our approach for style controllability.
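One way to picture "a single color per semantic region" as a style input is to attach the user's color to every pixel of that region alongside its class label. The sketch below is only an assumed encoding (one-hot label concatenated with normalized RGB); the abstract does not say how RUCGAN injects color, and the name `color_conditioning` is hypothetical.

```python
def color_conditioning(label_map, num_classes, user_colors):
    """Build a per-pixel conditioning vector: one-hot label + region color.

    label_map:   2D list of integer class labels
    num_classes: number of semantic classes
    user_colors: dict mapping label -> (r, g, b) in 0..255, chosen by the user
    Returns a 2D list where each entry has num_classes + 3 floats.
    """
    cond = []
    for row in label_map:
        cond_row = []
        for label in row:
            one_hot = [1.0 if c == label else 0.0 for c in range(num_classes)]
            r, g, b = user_colors[label]
            # Scale the color to [0, 1] so it is comparable to the one-hot part.
            cond_row.append(one_hot + [r / 255.0, g / 255.0, b / 255.0])
        cond.append(cond_row)
    return cond
```

Because the color comes from the user rather than from a reference image, any region can be given any color, including ones that never co-occur with that class in training data.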

artificial intelligence, machine learning, semantic region, (18 more...)

arXiv:2306.10646

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > South Korea > Gyeonggi-do > Suwon (0.04)

Genre: Research Report (0.51)

Industry: Information Technology (0.31)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
